The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
配备高速数字化器的前端电子设备正在使用并建议将来的核检测器。最近的文献表明,在处理来自核检测器的数字信号时,深度学习模型,尤其是一维卷积神经网络。模拟和实验证明了该领域神经网络的令人满意的准确性和其他好处。但是,仍需要研究特定的硬件加速在线操作。在这项工作中,我们介绍了Pulsedl-II,这是一种专门设计的,专门为事件功能(时间,能量等)从具有深度学习的脉冲中提取的应用。根据先前的版本,PULSEDL-II将RISC CPU纳入系统结构,以更好地功能灵活性和完整性。 SOC中的神经网络加速器采用三级(算术单元,处理元件,神经网络)层次结构,并促进数字设计的参数优化。此外,我们设计了一种量化方案和相关的实现方法(恢复和位移位),以在所选层类型的选定子集中与深度学习框架(例如Tensorflow)完全兼容。通过当前方案,支持神经网络的量化训练,并通过专用脚本自动将网络模型转换为RISC CPU软件,几乎没有准确性损失。我们在现场可编程门阵列(FPGA)上验证pulsedl-ii。最后,通过由直接数字合成(DDS)信号发生器和带有模数转换器(ADC)的FPGA开发板组成的实验设置进行系统验证。拟议的系统实现了60 PS的时间分辨率和0.40%的能量分辨率,在线神经网络推断在信号与噪声比(SNR)为47.4 dB时。
translated by 谷歌翻译
6-DOF GRASP姿势检测多盖和多对象是智能机器人领域的挑战任务。为了模仿人类的推理能力来抓住对象,广泛研究了数据驱动的方法。随着大规模数据集的引入,我们发现单个物理度量通常会产生几个离散水平的掌握置信分数,这无法很好地区分数百万的掌握姿势并导致不准确的预测结果。在本文中,我们提出了一个混合物理指标来解决此评估不足。首先,我们定义一个新的度量标准是基于力闭合度量的,并通过对象平坦,重力和碰撞的测量来补充。其次,我们利用这种混合物理指标来产生精致的置信度评分。第三,为了有效地学习新的置信度得分,我们设计了一个称为平面重力碰撞抓氏(FGC-Graspnet)的多分辨率网络。 FGC-GRASPNET提出了多个任务的多分辨率特征学习体系结构,并引入了新的关节损失函数,从而增强了GRASP检测的平均精度。网络评估和足够的实际机器人实验证明了我们混合物理指标和FGC-GraspNet的有效性。我们的方法在现实世界中混乱的场景中达到了90.5 \%的成功率。我们的代码可在https://github.com/luyh20/fgc-graspnet上找到。
translated by 谷歌翻译
高光谱成像是一种重要的传感技术,具有广泛的应用和环境科学,天气和地理/空间探索的地区的影响。高光谱图像(HSI)处理的一个重要任务是频谱空间特征的提取。利用多层网络(M-GSP)的最近开发的曲线图信号处理,这项工作提出了基于M-GSP特征提取的几种方法对HSI分段的方法。为了捕获联合光谱空间信息,我们首先为HSI定制一个基于张力的多层网络(MLN)模型,并为特征提取定义MLN奇异空间。然后,我们通过利用MLN谱聚类来开发无监督的HSI分段方法。通过MLN的聚类重新组合HSI像素,我们进一步提出了一种基于Superpixels的多分辨率融合的半监控HSI分类。我们的实验结果表明了HSI处理中M-GSP的强度和光谱 - 空间信息提取。
translated by 谷歌翻译
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
translated by 谷歌翻译
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that that the return-risk model can also account for risk from uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
translated by 谷歌翻译
Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet been criticized for learning inefficiency. We believe the insufficient utilization of training signals should be responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch with the disjoint regulation to raise the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual branch architecture to respectively predict invisible (masked) and visible (unmasked) tokens with superior learning targets. Rooting in orthogonal perspectives for training efficiency improvement, DM and JD cooperatively accelerate the training convergence yet not sacrificing the model generalization ability. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) to report competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks like semantic segmentation, object detection, etc., our DMJD also presents superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.
translated by 谷歌翻译
Recently, great progress has been made in single-image super-resolution (SISR) based on deep learning technology. However, the existing methods usually require a large computational cost. Meanwhile, the activation function will cause some features of the intermediate layer to be lost. Therefore, it is a challenge to make the model lightweight while reducing the impact of intermediate feature loss on the reconstruction quality. In this paper, we propose a Feature Interaction Weighted Hybrid Network (FIWHN) to alleviate the above problem. Specifically, FIWHN consists of a series of novel Wide-residual Distillation Interaction Blocks (WDIB) as the backbone, where every third WDIBs form a Feature shuffle Weighted Group (FSWG) by mutual information mixing and fusion. In addition, to mitigate the adverse effects of intermediate feature loss on the reconstruction results, we introduced a well-designed Wide Convolutional Residual Weighting (WCRW) and Wide Identical Residual Weighting (WIRW) units in WDIB, and effectively cross-fused features of different finenesses through a Wide-residual Distillation Connection (WRDC) framework and a Self-Calibrating Fusion (SCF) unit. Finally, to complement the global features lacking in the CNN model, we introduced the Transformer into our model and explored a new way of combining the CNN and Transformer. Extensive quantitative and qualitative experiments on low-level and high-level tasks show that our proposed FIWHN can achieve a good balance between performance and efficiency, and is more conducive to downstream tasks to solve problems in low-pixel scenarios.
translated by 谷歌翻译
Rigorous guarantees about the performance of predictive algorithms are necessary in order to ensure their responsible use. Previous work has largely focused on bounding the expected loss of a predictor, but this is not sufficient in many risk-sensitive applications where the distribution of errors is important. In this work, we propose a flexible framework to produce a family of bounds on quantiles of the loss distribution incurred by a predictor. Our method takes advantage of the order statistics of the observed loss values rather than relying on the sample mean alone. We show that a quantile is an informative way of quantifying predictive performance, and that our framework applies to a variety of quantile-based metrics, each targeting important subsets of the data distribution. We analyze the theoretical properties of our proposed method and demonstrate its ability to rigorously control loss quantiles on several real-world datasets.
translated by 谷歌翻译
Recently, large-scale pre-trained models have shown their advantages in many tasks. However, due to the huge computational complexity and storage requirements, it is challenging to apply the large-scale model to real scenes. A common solution is knowledge distillation which regards the large-scale model as a teacher model and helps to train a small student model to obtain a competitive performance. Cross-task Knowledge distillation expands the application scenarios of the large-scale pre-trained model. Existing knowledge distillation works focus on directly mimicking the final prediction or the intermediate layers of the teacher model, which represent the global-level characteristics and are task-specific. To alleviate the constraint of different label spaces, capturing invariant intrinsic local object characteristics (such as the shape characteristics of the leg and tail of the cattle and horse) plays a key role. Considering the complexity and variability of real scene tasks, we propose a Prototype-guided Cross-task Knowledge Distillation (ProC-KD) approach to transfer the intrinsic local-level object knowledge of a large-scale teacher network to various task scenarios. First, to better transfer the generalized knowledge in the teacher model in cross-task scenarios, we propose a prototype learning module to learn from the essential feature representation of objects in the teacher model. Secondly, for diverse downstream tasks, we propose a task-adaptive feature augmentation module to enhance the features of the student model with the learned generalization prototype features and guide the training of the student model to improve its generalization ability. The experimental results on various visual tasks demonstrate the effectiveness of our approach for large-scale model cross-task knowledge distillation scenes.
translated by 谷歌翻译